Asynchronous Runtime for Task-Based Dataflow Programming Models
Not recorded
Abstract
Parallel programming has grown steadily in importance since the power wall popularized multi-core processors and, with them, shared-memory parallel programming models. Task-based programming models, such as the OpenMP 4.0 standard, have become especially important. They let the programmer declare a set of data dependences for each task, which the runtime uses to order task execution. This order is computed using shared graphs that all threads update, but under exclusive access enforced by synchronization mechanisms (locks) to keep the dependences correct. Although exclusive access is necessary to avoid data races, it can cause contention that limits application parallelism. This becomes critical in many-core systems, where several threads may waste computing resources waiting to access the runtime structures. This master's thesis introduces the concept of asynchronous runtime management for the runtimes of task-based programming models. The proposal is based on the asynchronous management of runtime structures such as task dependence graphs: application threads request actions from the runtime instead of performing the required modifications directly. The requests are then handled by a runtime manager, which can be implemented in different ways. This thesis extends a previously implemented centralized runtime manager and presents a novel implementation of a distributed runtime manager. On the one hand, the runtime design based on a centralized manager [1] is extended to adapt the runtime behavior dynamically to the manager load, with the objective of being as fast as possible. On the other hand, a novel runtime design based on a distributed manager is proposed to overcome the limitations observed in the centralized design.
The distributed runtime implementation allows any thread to become a runtime manager thread when doing so helps to exploit the application parallelism. This is achieved using a new runtime feature, also implemented in this thesis, that dispatches runtime functionality through a callback system. The proposals are evaluated on different many-core architectures, and their performance is compared against the baseline runtimes used to implement the asynchronous versions. Results show that the centralized manager extension overcomes the hard limitations of the initial basic implementation, that the distributed manager fixes the problems observed in the previous implementation, and that the proposed asynchronous organization significantly outperforms the speedup obtained by the original runtime on real benchmarks.
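The request-based organization described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the names `AsyncRuntime`, `submit`, `notify_done`, `worker`, and `run_example` are hypothetical, and the sketch assumes the simplest case of one centralized manager thread. Worker threads never lock the dependence graph; they post requests to a queue, and only the manager thread modifies the graph.

```python
import queue
import threading
from collections import defaultdict


class AsyncRuntime:
    """Toy runtime (hypothetical): only the manager thread touches the graph."""

    def __init__(self):
        self.requests = queue.Queue()        # application threads -> manager
        self.ready = queue.Queue()           # manager -> worker threads
        self.finished = set()                # names of completed tasks
        self.pending = {}                    # task -> count of unmet dependences
        self.bodies = {}                     # task -> callable, while it waits
        self.successors = defaultdict(list)  # task -> tasks waiting on it
        self.manager = threading.Thread(target=self._manage)
        self.manager.start()

    def submit(self, name, body, deps=()):
        # Request an action instead of modifying the graph directly.
        self.requests.put(("submit", name, body, tuple(deps)))

    def notify_done(self, name):
        self.requests.put(("done", name))

    def shutdown(self):
        self.requests.put(("stop",))
        self.manager.join()

    def _manage(self):
        # The graph is private to this thread, so it needs no locks.
        while True:
            req = self.requests.get()
            if req[0] == "stop":
                return
            if req[0] == "submit":
                _, name, body, deps = req
                unmet = [d for d in deps if d not in self.finished]
                if not unmet:
                    self.ready.put((name, body))
                    continue
                self.pending[name] = len(unmet)
                self.bodies[name] = body
                for d in unmet:
                    self.successors[d].append(name)
            else:  # "done": release successors whose dependences are all met
                _, name = req
                self.finished.add(name)
                for succ in self.successors.pop(name, []):
                    self.pending[succ] -= 1
                    if self.pending[succ] == 0:
                        del self.pending[succ]
                        self.ready.put((succ, self.bodies.pop(succ)))


def worker(rt):
    # Run ready tasks until a None sentinel arrives.
    while True:
        item = rt.ready.get()
        if item is None:
            return
        name, body = item
        body()
        rt.notify_done(name)


def run_example():
    """Run a diamond-shaped graph: a -> (b, c) -> d. Returns execution order."""
    rt = AsyncRuntime()
    order = []
    all_done = threading.Event()
    workers = [threading.Thread(target=worker, args=(rt,)) for _ in range(2)]
    for w in workers:
        w.start()
    rt.submit("a", lambda: order.append("a"))
    rt.submit("b", lambda: order.append("b"), deps=["a"])
    rt.submit("c", lambda: order.append("c"), deps=["a"])
    rt.submit("d", lambda: (order.append("d"), all_done.set()), deps=["b", "c"])
    all_done.wait()
    for _ in workers:
        rt.ready.put(None)
    for w in workers:
        w.join()
    rt.shutdown()
    return order


if __name__ == "__main__":
    print(run_example())
```

In this sketch the manager is a dedicated thread, which corresponds loosely to the centralized design. The distributed design in the abstract instead lets any thread take the manager role; that variant is not shown here.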
Similar resources
Cores as Functional Units: A Task-Based, Out-of-Order, Dataflow Pipeline
The shift towards on-chip parallelism brings forth an effort to design intuitive parallel programming models that can be used by common programmers. In that context, dataflow programming models show promise for their simplicity and potential performance gains. However, dataflow models require complex runtime support as it is infeasible to statically identify all data dependencies at compile tim...
Dynamic Verification for Hybrid Concurrent Programming Models
We present a dynamic verification technique for a class of concurrent programming models that combine dataflow and shared memory programming. In this class of hybrid concurrency models, programs are built from tasks whose data dependencies are explicitly defined by a programmer and used by the runtime system to coordinate task execution. Differently from pure dataflow, tasks are allowed to have...
Dynamic Path Contraction for Distributed, Dynamic Dataflow Languages
We present a work in progress report on applying deforestation to distributed, dynamic dataflow programming models. We propose a novel algorithm, dynamic path contraction, that applies and reverses optimizations to a distributed dataflow application as the program executes. With this algorithm, data and control flow is tracked by the runtime system used to identify potential optimizations as th...
Automatic Code Generation for an Asynchronous Task-based Runtime
Hardware scaling considerations associated with the quest for exascale and extreme scale computing are driving system designers to consider event-driven-task (EDT)-oriented execution models for executing on deep hardware hierarchies. Further, for performance, productivity, and code sustainability reasons, there is an increasing demand for autoparallelizing compiler technologies to automatically...
Network Algebra for Asynchronous Dataflow
Network algebra is proposed as a uniform algebraic framework for the description and analysis of dataflow networks. An equational theory of networks, called BNA (Basic Network Algebra), is presented. BNA, which is essentially a part of the algebra of flownomials, captures the basic algebraic properties of networks. For asynchronous dataflow networks, additional constants and axioms are given; a...